Unsupervised Clustering of Utterances Using Non-Parametric Bayesian Methods
Authors
Abstract
Unsupervised clustering of utterances can be useful for modeling dialogue acts in dialogue applications. Previously, the Chinese restaurant process (CRP), a non-parametric Bayesian method, was introduced and showed promising results for clustering utterances in dialogue. This paper introduces the infinite HMM, which is also a non-parametric Bayesian method, and verifies its effectiveness. Experimental results in two dialogue domains show that the infinite HMM, which takes the sequence of utterances into account in its clustering process, significantly outperforms the CRP. Although the infinite HMM outperformed the other methods, we also found that clustering complex dialogue data, such as human-human conversations, remains harder than clustering human-machine dialogues.
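The abstract names the CRP without describing it. As a rough illustration only, the following Python sketch draws cluster labels from a CRP prior: each new item joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. The function name `crp_assignments` and the parameters `alpha` and `seed` are assumptions made for this sketch. It covers the prior alone, with no utterance likelihood and no sequence modeling, so it is not the paper's clustering method.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process prior.

    alpha (> 0) is the concentration parameter; larger values tend to
    produce more clusters. This is a sketch of the prior only.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for i in range(n_items):
        # Item i joins existing cluster k with probability counts[k] / (i + alpha)
        # and opens a new cluster with probability alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # a new cluster is created
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = crp_assignments(n_items=50, alpha=1.0)
print(len(counts), "clusters for", len(labels), "items")
```

The number of clusters is not fixed in advance; it grows with the data, which is the property that makes the CRP attractive when the set of dialogue acts is unknown.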
Similar resources
Automatic Clustering of Utterances for a Dialogue Act Design
Automatic clustering of utterances can be useful for the modeling of dialogue acts for dialogue applications. Previously, the Chinese restaurant process (CRP), a non-parametric Bayesian method, has been introduced and has shown promising results for the clustering of utterances in dialogue. This paper introduces the infinite HMM, which is also a non-parametric Bayesian method, and verifies its ...
Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing
Zero-resource speech processing (ZS) systems aim to learn structural representations of speech without access to labeled data. A starting point for these systems is the extraction of syllable tokens utilizing the rhythmic structure of a speech signal. Several recent ZS systems have therefore focused on clustering such syllable tokens into linguistically meaningful units. These systems have so f...
Non-Parametric Bayesian Human Motion Recognition Using a Single MEMS Tri-Axial Accelerometer
In this paper, we propose a non-parametric clustering method to recognize the number of human motions using features which are obtained from a single microelectromechanical system (MEMS) accelerometer. Since the number of human motions under consideration is not known a priori and because of the unsupervised nature of the proposed technique, there is no need to collect training data for the hum...
A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large scale data
An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization to make it possible to estimate the number of speakers. For this purpose, a framework of non-parametric Bayesian modeling is implemented with the Markov chain Monte Carlo and incorporated in the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet pr...
Unsupervised Modeling of Patient-Level Disease Dynamics
To provide insight into patient-level disease dynamics from data collected at irregular time intervals, this work extends applications of semi-parametric clustering for temporal mining. In the semi-parametric clustering framework, Markovian models provide useful parametric assumptions for modeling temporal dynamics, and a non-parametric method is used to cluster the temporal abstractions instea...
Publication date: 2011